Morphological parsing of Swahili using crowdsourced lexical resources
Abstract
We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources. Our analyzer was supplemented extensively with data from the Kamusi Project (kamusi.org), a user-contributed multilingual dictionary. Making use of this resource allowed us to achieve wide lexical coverage quickly, but the heterogeneous nature of user-contributed content also poses some challenges when adapting it for use in an expert system.
Similar resources
Morphological Parsing of Tone: An Experiment with Two-Level Morphology on the Ha Language
Morphological parsers are typically developed for languages without contrastive tonal systems. Ha, a typical Bantu language of Western Tanzania, poses a challenge to these parsers with both lexical and grammatical pitch-accent that would, in order to describe the tonal phenomena, seem to require an approach with a separate level for the tones. However, since the Two-Level Morphology (Koskenni...
Disambiguation of morphological analysis in Bantu languages
The paper describes problems in disambiguating the morphological analysis of Bantu languages by using Swahili as a test language. The main factors of ambiguity in this language group can be traced to the noun class structure on one hand and to the bi-directional word-formation on the other. In analyzing word-forms, the system applied utilizes SWATWOL, a morphological parsing program based on tw...
Compound words and structure in the lexicon
The structure of lexical entries and the status of lexical decomposition remain controversial. In the psycholinguistic literature, one aspect of this debate concerns the psychological reality of the morphological complexity difference between compound words (teacup) and single words (crescent). The present study investigates morphological decomposition in compound words using visual lexical dec...
The Lefff, a Freely Available and Large-coverage Morphological and Syntactic Lexicon for French
In this paper, we introduce the Lefff, a freely available, accurate and large-coverage morphological and syntactic lexicon for French, used in many NLP tools such as large-coverage parsers. We first describe Alexina, the lexical framework in which the Lefff is developed as well as the linguistic notions and formalisms it is based on. Next, we describe the various sources of lexical data we use...
Structure in the Lexicon
The structure of lexical entries and the status of lexical decomposition remain controversial. In the psycholinguistic literature, one aspect of this debate concerns the psychological reality of the morphological complexity difference between compound words (teacup) and single words (crescent). The present study investigates morphological decomposition in compound words using visual lexical dec...
Publication date: 2014